SPEECH CODING SYSTEM WITH A MUSIC CLASSIFIER

Inventors:

Adil Benyassine 
Huan-Yu Su 

Background of the Invention

1. Technical Field 

This invention relates generally to digital coding systems. More particularly, 
this invention relates to classification systems for speech coding. 

2. Related Art

Telecommunication systems include both landline and wireless radio systems.
Wireless telecommunication systems use radio frequency (RF) communication.
Currently, the frequencies available for wireless systems are centered in frequency
ranges around 900 MHz and 1900 MHz. The expanding popularity of wireless
communication devices, such as cellular telephones, is increasing the RF traffic in
these frequency ranges. Reduced-bandwidth communication would permit more data
and voice transmissions in these frequency ranges, enabling the wireless system to
allocate resources to a larger number of users.

Wireless systems may transmit digital or analog data. Digital transmission,
however, has greater noise immunity and reliability than analog transmission. Digital
transmission also allows more compact equipment and the implementation of
sophisticated signal processing functions. In the digital transmission of speech
signals, an analog-to-digital converter samples an analog speech waveform. The
digitized waveform is compressed (encoded) for transmission. The encoded
signal is received and decompressed (decoded). After digital-to-analog conversion,
the reconstructed speech is played in an earpiece, loudspeaker, or the like.

The analog-to-digital converter uses a large number of bits to represent the
analog speech waveform, which requires a relatively large bandwidth. Speech
compression reduces the number of bits that represent the speech signal, thus
reducing the bandwidth needed for transmission. However, speech compression may
degrade the quality of the decompressed speech. In general, a higher bit rate results
in higher quality, while a lower bit rate results in lower quality.

Modern speech compression techniques (coding techniques) produce
decompressed speech of relatively high quality at relatively low bit rates. One coding
technique attempts to represent the perceptually important features of the speech
signal without preserving the actual speech waveform. Another coding technique, a
variable-bit-rate encoder, varies the degree of speech compression depending on the
part of the speech signal being compressed. Typically, perceptually important parts of
speech (e.g., voiced speech, plosives, or voiced onsets) are coded with a higher
number of bits. Less important parts of speech (e.g., unvoiced parts or silence
between words) are coded with a lower number of bits. The resulting average bit rate
can be lower than that of a fixed-bit-rate coder that provides decompressed speech of
similar quality. These speech compression techniques lower the amount of
bandwidth required to digitally transmit a speech signal.

These low-bit-rate speech coding systems may provide suitable speech quality.
However, the quality of the coded signal typically is unacceptable for music, because
of the low bit rates that speech codecs typically use for this type of signal. Music may
be provided by a service or similar feature that plays music while a party is waiting.
A radio, stereo, or other electronic equipment, a live performance, and the like also
may provide music that is close enough to be picked up and transmitted by a
communication system.

If a music signal is to be transmitted, the speech coding system should switch
to higher bit rates to accommodate the music signal. However, current speech coding
systems do not effectively classify when a music signal is present. Typically, a voice
activity detector (VAD) is used to differentiate speech and music from noise.
However, a VAD does not effectively differentiate between speech and music. As a
result, most music signals are transmitted at lower bit rates or at a combination of
lower and higher bit rates.

Summary 

The invention provides a speech coding system with a music classifier that
provides a classification of an input or speech signal. The classification may be that
the input signal is noise, speech, or music. The music classifier analyzes or determines
signal properties of the input signal and compares the signal properties to thresholds
to determine the classification of the input signal.

In one aspect, the speech coding system with a music classifier comprises an
encoder disposed to receive an input signal. The encoder provides a bitstream based
upon a speech coding of a portion of the input signal. The speech coding has a bit
rate. The encoder provides a classification of the input signal, and the classification
comprises at least music. The encoder adjusts the bit rate in response to the
classification of the input signal.

In a method of classifying music in a speech coding system, one or more first
signal parameters are determined in response to an input signal. The first signal
parameters are compared to at least one noise threshold. When the first signal
parameters are not beyond the noise threshold, the input signal is classified as noise.
When the first signal parameters are beyond the noise threshold, one or more second
signal parameters are determined in response to the input signal. The second signal
parameters are compared to at least one music threshold. When the second signal
parameters are beyond the music threshold, the input signal is classified as speech.
When the second signal parameters are not beyond the music threshold, the input
signal is classified as music.

Other systems, methods, features, and advantages of the invention will be or
will become apparent to one with skill in the art upon examination of the following
figures and detailed description. It is intended that all such additional systems,
methods, features, and advantages be included within this description, be within the
scope of the invention, and be protected by the accompanying claims.

Brief Description of the Drawings

The invention can be better understood with reference to the following figures.
The components in the figures are not necessarily to scale, emphasis instead being
placed upon illustrating the principles of the invention. Moreover, in the figures, like
reference numerals designate corresponding parts throughout the different views.

Figure 1 is a block diagram of a speech coding system having a music
classifier.

Figure 2 is a flowchart showing a method of classifying music in a speech
coding system.

Detailed Description of the Preferred Embodiments 

FIG. 1 is a block diagram of a speech coding system 100 with a music
classifier. The speech coding system 100 includes a first communication device 102
operatively connected via a communication medium 104 to a second communication
device 106. The speech coding system 100 may be any cellular telephone, radio
frequency, or other telecommunication system capable of encoding a speech signal
118 and decoding it to create synthesized speech 120. The communication devices
102 and 106 may be cellular telephones, portable radio transceivers, and other
wireless or wireline communication systems. Wireline systems may include Voice
over Internet Protocol (VoIP) devices and systems.

The communication medium 104 may include systems using any transmission
mechanism, including radio waves, infrared, landlines, fiber optics, combinations of
transmission schemes, or any other medium capable of transmitting digital signals.
The communication medium 104 may also include a storage mechanism, including a
memory device, a storage medium, or another device capable of storing and retrieving
digital signals. In use, the communication medium 104 transmits digital signals,
including a bitstream, between the first and second communication devices 102 and
106.

The first communication device 102 includes an analog-to-digital converter
108, a preprocessor 110, and an encoder 112. The first communication device 102
may have an antenna or other communication medium interface (not shown) for
sending and receiving digital signals with the communication medium 104. The first
communication device 102 also may have other components known in the art for any
communication device.

The second communication device 106 includes a decoder 114 and a digital-to-analog
converter 116 connected as shown. Although not shown, the second
communication device 106 may have one or more of a synthesis filter, a
postprocessor, and other components known in the art for any communication device.
The second communication device 106 also may have an antenna or other
communication medium interface (not shown) for sending and receiving digital
signals with the communication medium 104.

The preprocessor 110, encoder 112, and/or decoder 114 may comprise
processors, digital signal processors, application-specific integrated circuits, or other
digital devices for implementing the algorithms discussed herein. The preprocessor
110 and encoder 112 also may comprise separate components or the same component.

In use, the analog-to-digital converter 108 receives an input or speech signal
118 from a microphone (not shown) or other signal input device. The speech signal
may be a human voice, music, or any other analog signal. The analog-to-digital
converter 108 digitizes the speech signal, providing a digitized signal to the
preprocessor 110. The preprocessor 110 passes the digitized signal through a
high-pass filter (not shown), preferably with a cutoff frequency of about 80 Hz. The
preprocessor 110 may perform other processes to improve the digitized signal for
encoding.
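The high-pass preprocessing step can be illustrated with a minimal first-order filter. This is a sketch under stated assumptions: the description specifies only a cutoff of about 80 Hz, so the one-pole, one-zero structure, the `HighPass` type, and the `highpass_init`/`highpass_run` names are hypothetical rather than the codec's actual preprocessor.

```c
#include <stddef.h>

/* Hypothetical first-order high-pass filter for the preprocessor.
 * The description gives only the cutoff (about 80 Hz); the filter
 * order and structure here are assumptions. */
typedef struct {
    double a;      /* recursion coefficient derived from the cutoff */
    double x_prev; /* previous input sample  */
    double y_prev; /* previous output sample */
} HighPass;

static void highpass_init(HighPass *hp, double cutoff_hz, double fs_hz)
{
    const double pi = 3.14159265358979323846;
    double rc = 1.0 / (2.0 * pi * cutoff_hz); /* RC time constant  */
    double dt = 1.0 / fs_hz;                  /* sampling interval */
    hp->a = rc / (rc + dt);
    hp->x_prev = 0.0;
    hp->y_prev = 0.0;
}

/* Filters n samples in place: y[n] = a * (y[n-1] + x[n] - x[n-1]). */
static void highpass_run(HighPass *hp, double *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double x = buf[i];
        double y = hp->a * (hp->y_prev + x - hp->x_prev);
        hp->x_prev = x;
        hp->y_prev = y;
        buf[i] = y;
    }
}
```

With an 80 Hz cutoff at 8000 Hz the coefficient `a` is about 0.941, so a constant (DC) input decays toward zero within a few hundred samples while higher-frequency content passes largely unchanged.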

The encoder 112 segments the digitized speech signal into frames to generate
a bitstream. In one embodiment, the speech coding system 100 uses frames of 160
samples, corresponding to 20 milliseconds per frame at a sampling rate of about
8000 Hz. The encoder 112 provides the frames via a bitstream to the
communication medium 104.
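The frame arithmetic above can be checked with a small sketch. The constants follow the embodiment (160 samples at about 8000 Hz); the helper names are hypothetical, and holding back a trailing partial frame is an assumption made only for illustration.

```c
#include <stddef.h>

#define SAMPLE_RATE_HZ 8000 /* "about 8000 Hz" in the embodiment */
#define FRAME_LEN      160  /* samples per frame in the embodiment */

/* Frame duration in milliseconds: 160 * 1000 / 8000 = 20. */
static int frame_duration_ms(void)
{
    return FRAME_LEN * 1000 / SAMPLE_RATE_HZ;
}

/* Number of complete frames in n samples; a trailing partial frame is
 * assumed to be held back until more samples arrive. */
static size_t num_full_frames(size_t n)
{
    return n / FRAME_LEN;
}

/* Start of frame k within a buffer of 16-bit samples. */
static const short *frame_ptr(const short *samples, size_t k)
{
    return samples + k * FRAME_LEN;
}
```

One second of audio (8000 samples) therefore yields 50 frames.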

In one embodiment, the encoder 112 comprises a music classifier (not shown),
which may have a voice activity detector (not shown). The music classifier provides
a classification of the digitized signal in each frame. The classification may be that
the input or speech signal is noise, speech, or music. The music classifier may use a
voice activity detector (VAD) to differentiate speech and music frames from noise
frames. The music classifier further differentiates speech frames from music frames.
In one aspect, the music classifier analyzes or determines the signal properties of the
digitized signal. The signal properties may include one or more of pitch gain, spectral
differences, frame energy, and other suitable properties for differentiating between
music and speech. The music classifier compares the signal properties to thresholds
to determine whether a frame is music or speech. The music classifier also may have
one or more counters or may use one or more running means of the signal properties
to provide a confidence level of the determination. The running means and counters
may extend over a time period that covers multiple frames. The time period may be
about 640 milliseconds, that is, 32 frames of 20 milliseconds.
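Most of the classifier's running means are first-order recursive averages, and the 640 millisecond period corresponds to 32 frames of 20 milliseconds. A minimal sketch of both, with hypothetical helper names:

```c
/* First-order running mean used by the classifier:
 *   mean <- alpha * mean + (1 - alpha) * x
 * A larger alpha keeps more history. */
static double running_mean(double mean, double x, double alpha)
{
    return alpha * mean + (1.0 - alpha) * x;
}

/* Number of whole frames in a window, e.g. 640 ms / 20 ms = 32. */
static int frames_in_window(int window_ms, int frame_ms)
{
    return window_ms / frame_ms;
}
```

For example, with alpha = 0.75 a single update from 0 toward an input of 1 gives 0.25; repeated updates converge toward the input.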

The decoder 114 receives the bitstream from the communication medium 104.
The decoder 114 operates to decode the bitstream and generate a reconstructed speech
signal in the form of a digital signal. The reconstructed speech signal is converted to
an analog or synthesized speech signal 120 by the digital-to-analog converter 116.
The synthesized speech signal 120 may be provided to a speaker (not shown) or other
signal output device.

The encoder 112 and decoder 114 use a speech compression system,
commonly called a codec, to reduce the bit rate of the noise-suppressed digitized
speech signal. There are numerous speech codec algorithms that reduce the
number of bits required to digitally encode the original speech or digitized signal
while attempting to maintain high-quality reconstructed speech. The code excited
linear prediction (CELP) coding technique uses several prediction techniques to
remove redundancy from the speech signal. The CELP coding approach is
frame-based: sampled input speech signals (i.e., the preprocessed digitized speech
signals) are stored in blocks of samples called frames, and the frames are processed
to create a compressed speech signal in digital form.

The CELP coding approach uses two types of predictors, a short-term
predictor and a long-term predictor. The short-term predictor is typically applied
before the long-term predictor. The short-term predictor also is referred to as linear
prediction coding (LPC) or a spectral representation and typically comprises 10
prediction parameters. A first prediction error may be derived from the short-term
predictor and is called a short-term residual. A second prediction error may be
derived from the long-term predictor and is called a long-term residual. The
long-term residual may be coded using a fixed codebook that includes a plurality of
fixed codebook entries or vectors. During coding, one of the entries may be selected
and multiplied by a fixed codebook gain to represent the long-term residual. The
long-term predictor also can be referred to as a pitch predictor or an adaptive codebook
and typically comprises a lag parameter and a long-term predictor gain parameter.

The CELP encoder 112 performs an LPC analysis to determine the short-term
predictor parameters. Following the LPC analysis, the long-term predictor parameters
and the fixed codebook entries that best represent the prediction error of the long-term
residual are determined. Analysis-by-synthesis (ABS) is employed in CELP coding.
In the ABS approach, the best contribution from the fixed codebook and the best
long-term predictor parameters are found by synthesizing candidate signals with an
inverse prediction filter and applying a perceptual weighting measure.
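The analysis-by-synthesis search can be sketched for the fixed codebook alone. This is a simplified illustration, not the encoder's actual search: it assumes each codebook entry has already been passed through the synthesis filter (`filtered`), that `target` is the signal to match, and it omits the perceptual weighting filter. For each entry the least-squares gain is g = <t,y>/<y,y>, and the entry with the largest <t,y>^2/<y,y> gives the smallest squared error |t - g*y|^2.

```c
#include <stddef.h>

/* Returns the index of the best codebook entry (or -1 if all entries are
 * zero) and writes its least-squares gain to *best_gain. */
static int search_codebook(const double *target, const double *const *filtered,
                           size_t num_entries, size_t len, double *best_gain)
{
    int best = -1;
    double best_score = -1.0;
    *best_gain = 0.0;
    for (size_t k = 0; k < num_entries; k++) {
        double ty = 0.0, yy = 0.0; /* correlation and energy */
        for (size_t n = 0; n < len; n++) {
            ty += target[n] * filtered[k][n];
            yy += filtered[k][n] * filtered[k][n];
        }
        if (yy <= 0.0)
            continue; /* skip all-zero entries */
        double score = ty * ty / yy; /* error reduction for this entry */
        if (score > best_score) {
            best_score = score;
            best = (int)k;
            *best_gain = ty / yy;
        }
    }
    return best;
}
```

Maximizing the normalized correlation avoids computing the error for every gain explicitly, which is the usual reason codebook searches are written this way.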

The short-term LPC prediction coefficients, the adjusted fixed-codebook gain,
and the lag parameter and adjusted gain parameter of the long-term predictor are
quantized. The quantization indices, as well as the fixed codebook indices, are sent
from the encoder to the decoder.

The CELP decoder 114 uses the fixed codebook indices to extract a vector
from the fixed codebook. The vector is multiplied by the fixed-codebook gain to
create a fixed codebook contribution. A long-term predictor contribution is added to
the fixed codebook contribution to create a synthesized excitation, commonly
referred to simply as an excitation. The long-term predictor contribution comprises
the excitation from the past multiplied by the long-term predictor gain. The
long-term predictor contribution alternatively may be described as an adaptive
codebook contribution or a long-term pitch filtering characteristic. The excitation is
passed through a synthesis filter, which uses the LPC prediction coefficients quantized
by the encoder to generate synthesized speech. The synthesized speech may be passed
through a post-filter that reduces the perceptual coding noise. Other codecs and
associated coding algorithms may be used, such as adaptive multi-rate (AMR),
extended code excited linear prediction (eX-CELP), selectable mode vocoder (SMV),
multi-pulse, regular pulse, harmonic-based, transform-based, and the like.
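The decoder steps above (scaled past excitation plus scaled fixed-codebook vector, then the LPC synthesis filter) can be sketched as follows. The sign convention A(z) = 1 + a1 z^-1 + ... + a10 z^-10, the filter-memory layout, and all function names are assumptions; quantization and post-filtering are omitted.

```c
#include <stddef.h>

#define LPC_ORDER 10 /* "typically ... 10 prediction parameters" */

/* Excitation u[n] = gp * v[n] + gc * c[n]: v[] is the adaptive-codebook
 * (long-term predictor) vector taken from past excitation, c[] is the
 * selected fixed-codebook vector. */
static void build_excitation(const double *v, double gp,
                             const double *c, double gc,
                             double *u, size_t len)
{
    for (size_t n = 0; n < len; n++)
        u[n] = gp * v[n] + gc * c[n];
}

/* All-pole synthesis filter 1/A(z):
 *   s[n] = u[n] - sum_{i=1..p} a[i] * s[n-i]
 * a[] has LPC_ORDER + 1 entries (a[0] unused); mem[0..p-1] holds the
 * last p outputs, mem[0] being the most recent. */
static void synthesis_filter(const double *a, const double *u, double *s,
                             size_t len, double *mem)
{
    for (size_t n = 0; n < len; n++) {
        double acc = u[n];
        for (int i = 1; i <= LPC_ORDER; i++)
            acc -= a[i] * mem[i - 1];
        /* shift filter memory and insert the new output */
        for (int i = LPC_ORDER - 1; i > 0; i--)
            mem[i] = mem[i - 1];
        mem[0] = acc;
        s[n] = acc;
    }
}
```

For instance, with a single coefficient a1 = -0.5, an impulse excitation produces the decaying output 1, 0.5, 0.25, 0.125.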

Figure 2 shows a method of classifying music in speech coding. In 240, a
speech signal is digitized. An analog-to-digital converter or other suitable digitizing
device may be used to digitize the signal. In 242, one or more first signal parameters
are determined for a frame or portion of the digitized signal. The portion may include
a sub-frame, half-frame, or the like. The first signal parameters may comprise a
noise-to-signal ratio, frame energy, and other parameters useful to determine whether
the frame contains noise. In 244, the first signal parameters are compared to one or
more noise thresholds. The noise thresholds may be selected to classify a frame as
noise when the digitized signal is all noise, mostly noise, or another mix of noise and
speech. A voice activity detector (VAD) or similar device may be used to determine
the signal parameters and compare them with the noise thresholds. The VAD may
provide a detection of active speech, inactive speech, or both. Active speech
may comprise music and speech; inactive speech may comprise noise. In 246, a
noise determination is made to determine whether the digitized signal in the frame is
noise. If the first signal parameters are not beyond the noise thresholds, the digitized
signal and the frame are classified in 248 as noise and a noise frame, respectively. If
the first signal parameters are beyond the noise thresholds, the digitized signal may be
speech or music.

In 250, one or more second signal parameters are determined for the frame. In
252, the second signal parameters are compared to one or more music thresholds. The
second signal parameters and music thresholds are further described below. The
music thresholds may be selected to classify a frame as music when the digitized
signal is all music, mostly music, or another mix of music and speech. The music
thresholds also may be selected to classify a frame as speech when the digitized signal
is all speech, mostly speech, or another mix of music and speech.

In 254, a music determination is made to determine whether the digitized
signal in the frame is music. Equivalently, the music determination may determine
whether the digitized signal in the frame is speech. If the second signal parameters are
beyond the music thresholds, the digitized signal and the frame are classified in 256 as
speech and a speech frame, respectively. If the second signal parameters are not
beyond the music thresholds, the digitized signal and frame are classified in 258 as
music and a music frame, respectively.
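The decision flow of Figure 2 reduces to two threshold tests. In this minimal sketch a single scalar stands in for each set of signal parameters, and "beyond" is interpreted as "greater than"; both simplifications, and the names `classify_frame` and `FrameClass`, are assumptions. In the described method the second parameters need only be computed once the noise test has passed.

```c
/* Hypothetical frame classes produced by the two-stage decision. */
typedef enum { CLASS_NOISE, CLASS_SPEECH, CLASS_MUSIC } FrameClass;

static FrameClass classify_frame(double first_param, double noise_threshold,
                                 double second_param, double music_threshold)
{
    /* 244/246: not beyond the noise threshold -> noise (248) */
    if (!(first_param > noise_threshold))
        return CLASS_NOISE;
    /* 252/254: beyond the music threshold -> speech (256) */
    if (second_param > music_threshold)
        return CLASS_SPEECH;
    /* otherwise -> music (258) */
    return CLASS_MUSIC;
}
```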

The music classifier may classify the input or speech signal as either music or
speech. This determination or classification may take place after the noise frames are
classified. The music classifier may reuse some of the first signal parameters and
extract the second signal parameters from the speech signal. These parameters are
compared to music thresholds to determine whether the input signal is music or
speech. While certain signal parameters are described, other or additional signal
parameters may be used to determine whether the input signal is music or speech.

The music classifier has a buffer of the five previous normalized pitch
correlations, $corr_p(i)$. The line spectral frequencies $lsf(1)$ and $lsf(2)$ are obtained
from the linear prediction coding (LPC) analysis. The line spectral frequencies, $lsf$,
are transformations of the LPC parameters (the short-term filter coefficients). The
$lsf$ are obtained by decomposing the inverse transfer function $A(z)$ into a set of two
transfer functions, one having even symmetry and the other having odd symmetry.
The $lsf$ are the frequencies of the roots of these transfer functions (polynomials),
which lie on the unit circle of the z-plane. $A(z)$ models the inverse frequency
response of the vocal tract. A difference $\Delta_{lsf}$ between $lsf(2)$ and $lsf(1)$ is
computed:

$$\Delta_{lsf} = lsf(2) - lsf(1)$$

A running mean of $lsf(1)$ is computed as:

$$\overline{lsf}(1) = 0.75\,\overline{lsf}(1) + 0.25\,lsf(1)$$

A running mean energy, $\bar{E}$, is calculated as:

$$\bar{E} = 0.75\,\bar{E} + 0.25\,E$$

where $E$ is the frame energy.
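A spectral difference $SD$ is calculated as:

$$SD = \sum_{i=1}^{10} \left( k(i) - \bar{k}_n(i) \right)^2$$

where $k(i)$ are the reflection coefficients of the current frame and $\bar{k}_n(i)$ are the
running mean reflection coefficients of noise/silence.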

The running mean $\bar{E}'$ of the partial residual energy $E'$ is updated only when the
input VAD is inactive, as:

$$\bar{E}' = 0.9\,\bar{E}' + 0.1\,E'$$

and

$$\bar{k}_n(i) = 0.75\,\bar{k}_n(i) + 0.25\,k(i), \quad i = 1, \ldots, 10$$
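The spectral difference and the noise-statistics update can be written directly from the equations above. A sketch, assuming NP = 10 reflection coefficients held in plain arrays; the function names are hypothetical.

```c
#define NP 10 /* number of reflection coefficients */

/* Spectral difference between the current reflection coefficients k[]
 * and the running-mean noise coefficients kn[]: sum of squared differences. */
static double spectral_difference(const double k[NP], const double kn[NP])
{
    double sd = 0.0;
    for (int i = 0; i < NP; i++) {
        double d = k[i] - kn[i];
        sd += d * d;
    }
    return sd;
}

/* Noise statistics change only while the VAD reports inactivity:
 * running partial-residual energy with weights 0.9/0.1 and
 * running noise reflection coefficients with weights 0.75/0.25. */
static void update_noise_stats(int vad_inactive, double e_partial,
                               const double k[NP], double *mean_e_partial,
                               double kn[NP])
{
    if (!vad_inactive)
        return;
    *mean_e_partial = 0.9 * *mean_e_partial + 0.1 * e_partial;
    for (int i = 0; i < NP; i++)
        kn[i] = 0.75 * kn[i] + 0.25 * k[i];
}
```

Freezing the noise statistics during active frames keeps speech or music from leaking into the noise reference that SD is measured against.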
A running mean of the normalized pitch correlation is given by:

$$\overline{corr}_p = 0.8\,\overline{corr}_p + 0.2 \cdot \frac{1}{5}\sum_{i=1}^{5} corr_p(i)$$

WO 02/065457    PCT/US02/01847

A periodicity flag $F_p$ is calculated using $corr_p(i)$ and different music
thresholds. A spectral continuity counter $c_s$ is incremented if $k(2) \ge 0.0$ and
$\overline{corr}_p < 0.5$, and is reset to 0 otherwise. A periodicity continuity counter $c_p$ is
incremented each time $F_p$ is set and is reset to 0 every 32 frames.

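A running mean of the periodicity counter $c_p$ is updated every 32 frames as:

$$\bar{c}_p = \alpha\,\bar{c}_p + (1 - \alpha)\,c_p$$

where

$$\alpha = \begin{cases} 0.98, & c_p > 12 \\ 0.95, & c_p > 10 \\ 0.90, & \text{otherwise} \end{cases}$$

A counter $c_{p0}$ tracks the behavior of $c_p$: $c_{p0}$ is incremented each time $c_p$ is
0 and is reset otherwise.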
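A per-frame sketch of the periodicity counter, its running mean, and the counter that tracks when it stays at zero, following the order of operations in this document's C listing. The struct, the function name, and incrementing the frame counter at the start of each call are assumptions; as in the listing, the first 32-frame block initializes the running mean directly.

```c
/* Hypothetical per-frame state; comments name the listing's variables. */
typedef struct {
    int    cp;      /* frames with the periodicity flag set (count_pflag)  */
    double mean_cp; /* running mean of cp (Mcount_pflag)                   */
    int    cp0;     /* consecutive frames with cp == 0 (count_consc_pflag) */
    long   frame;   /* frame counter (frm_count), incremented on each call */
} PeriodicityState;

static void periodicity_update(PeriodicityState *st, int periodic)
{
    st->frame++;
    if (periodic)
        st->cp++;

    if (st->frame % 32 == 0) {            /* every 32 frames (640 ms) */
        if (st->frame == 32)
            st->mean_cp = (double)st->cp; /* first block initializes the mean */
        else if (st->cp > 12)
            st->mean_cp = 0.98 * st->mean_cp + 0.02 * st->cp;
        else if (st->cp > 10)
            st->mean_cp = 0.95 * st->mean_cp + 0.05 * st->cp;
        else
            st->mean_cp = 0.90 * st->mean_cp + 0.10 * st->cp;
    }

    if (st->cp == 0)
        st->cp0++;
    else
        st->cp0 = 0;

    if (st->frame % 32 == 0)
        st->cp = 0;                       /* start a new 32-frame block */
}
```

The heavier smoothing for high counts makes the running mean slow to fall once a long run of periodic frames has been observed.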

A very low frequency noise flag $F_v$ is set if the initial VAD is inactive and
either $lsf(1) < 110$ Hz or $\overline{lsf}(1) < 150$ Hz. The initial inactive VAD decision
from the VAD module may be corrected to an active VAD decision by comparing
$SD$, $E'$, $\bar{E}'$, $E$, and $\bar{c}_p$ to a set of thresholds. A noise continuity counter $c_n$ is
incremented each time the corrected VAD is inactive and is reset otherwise.



A running mean of the normalized pitch correlation, $\overline{corr}_p^{\,N}$, is updated if either
the corrected VAD is inactive or $F_v$ is set. $\overline{corr}_p^{\,N}$ essentially tracks the
normalized pitch correlation during noise/silence:

$$\overline{corr}_p^{\,N} = 0.8\,\overline{corr}_p^{\,N} + 0.2 \cdot \frac{1}{5}\sum_{i=1}^{5} corr_p(i)$$



A music continuity counter $c_m$ is adaptively incremented and decremented by
comparing the signal parameters to each other and to a set of music thresholds,
under the control of the various flags. The music counter $c_m$, the other counters, and
other parameters may be modified, determined, or otherwise obtained through one or
more statistical analyses of the input or speech signal.

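A running mean of this counter, $\bar{c}_m$, is updated as:

$$\bar{c}_m = 0.9\,\bar{c}_m + 0.1\,c_m$$

The music detection flag is set if either $\bar{c}_p \ge 18$ or $\bar{c}_m > 200$. In this case,
$\bar{E}'$ is reset to 0. $\bar{c}_p$, $c_{p0}$, and $c_s$ are reset to 0 if $\bar{E} < 13$ dB, if $F_v$ is set, if
$c_{p0} > 50$, or if $c_s > 20$. $c_m$ and $\bar{c}_m$ are set to 0 if $c_n > 50$.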
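The final decision and the counter resets described above can be sketched as two small functions. The `MusicState` struct and the function names are hypothetical; the field comments give the matching variable names from the C listing, and the thresholds (18, 200, 13 dB, 50, 20) are the ones stated in the text.

```c
/* Hypothetical classifier state; comments name the listing's variables. */
typedef struct {
    double mean_cp;     /* running periodicity count (Mcount_pflag)      */
    int    cp0;         /* zero-periodicity run length (count_consc_pflag) */
    int    cs;          /* spectral continuity counter (count_consc_rflag) */
    int    cn;          /* noise continuity counter (count_consc_nflag)    */
    double cm;          /* music continuity counter (mus_update)           */
    double mean_cm;     /* running mean of cm (mean_mus_update)            */
    double mean_e_part; /* running partial residual energy (MeanSE)        */
} MusicState;

/* Smooth the music counter, then apply the two detection thresholds. */
static int music_decision(MusicState *st)
{
    st->mean_cm = 0.9 * st->mean_cm + 0.1 * st->cm;
    if (st->mean_cp >= 18.0 || st->mean_cm > 200.0) {
        st->mean_e_part = 0.0; /* reset when music is detected */
        return 1;
    }
    return 0;
}

/* Counter resets; mean_e_db is the running energy, fv the low-frequency
 * noise flag. */
static void apply_resets(MusicState *st, double mean_e_db, int fv)
{
    if (mean_e_db < 13.0 || fv || st->cp0 > 50 || st->cs > 20) {
        st->mean_cp = 0.0;
        st->cp0 = 0;
        st->cs = 0;
    }
    if (st->cn > 50) {
        st->cm = 0.0;
        st->mean_cm = 0.0;
    }
}
```

Because the decision uses running means rather than single-frame values, a few atypical frames are unlikely to toggle the music flag on their own.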

Another method of classifying music in speech coding uses the following
computer code, written in the C programming language. The C programming
language is well known to those having skill in the art of speech coding and speech
processing. The following C code may be performed within steps 250, 252, and 254
of Figure 2. The listing is a fragment: the helper routines dif_dvector, dot_dvector,
and wad_dvector, the MIN and MAX macros, and the state variables are not defined
in it.

    /* Running mean energy and spectral difference from the noise spectrum */
    MLLenergy = 0.75*MLLenergy + 0.25*LLenergy;
    dif_dvector(mrc, rc, tmp_vec, 0, NP-1);
    dot_dvector(tmp_vec, tmp_vec, &SD, 0, NP-1);

    /* Noise statistics are updated only while the VAD is inactive */
    if (*Vad == NOISE)
    {
        MeanSE = 0.9*MeanSE + 0.1*Lenergy;
        wad_dvector(mrc, 0.75, rc, 0.25, mrc, 0, NP-1);
    }

    /* Average of the five buffered pitch values and its running mean */
    sum2 = 0.0;
    for (i = 0; i < 5; i++)
        sum2 += pgains[i];
    sum2 = sum2/5.0;
    if (LLenergy < 10.0)
        sum2 = MIN(pgains[3], pgains[4]);

    MeanPgain = 0.8*MeanPgain + 0.2*sum2;

    /* Periodicity flag */
    if (MeanPgain > 0.63)
        PFLAG2 = 1;
    else
        PFLAG2 = 0;

    if (std < 1.30 && MeanPgain > 0.45)
        PFLAG1 = 1;
    else
        PFLAG1 = 0;

    PFLAG = (INT16)(((INT16)prev_vad && (INT16)(PFLAG1 || PFLAG2))
                    || (INT16)(PFLAG2));

    /* Spectral continuity counter */
    if (rc[1] >= 0.0 && MeanPgain < 0.5)
        count_consc_rflag++;
    else
        count_consc_rflag = 0;

    /* Periodicity counter and its running mean, every 32 frames */
    if (PFLAG == 1)
        count_pflag++;

    if ((frm_count % (64/2)) == 0)
    {
        if (frm_count == 64/2)
            Mcount_pflag = (FLOAT64)count_pflag;
        else
        {
            if (count_pflag > 25/2)
                Mcount_pflag = 0.98*Mcount_pflag + 0.02*(FLOAT64)count_pflag;
            else if (count_pflag > 20/2)
                Mcount_pflag = 0.95*Mcount_pflag + 0.05*(FLOAT64)count_pflag;
            else
                Mcount_pflag = 0.90*Mcount_pflag + 0.10*(FLOAT64)count_pflag;
        }
    }

    if (count_pflag == 0)
        count_consc_pflag++;
    else
        count_consc_pflag = 0;

    /* Very low frequency noise flag (thresholds as fractions of 8000 Hz) */
    vlow_freq_noise = 0;
    if ((*Vad == NOISE) && (lsf[1] < 110.0/8000.0 ||
                            MAX(lsf[0], mlsf[0]) < 150.0/8000.0))
        vlow_freq_noise = 1;

    if (MLLenergy < 13.0 || vlow_freq_noise == 1 ||
        count_consc_pflag > 50 || count_consc_rflag > 20)
    {
        Mcount_pflag = 0.0;
        count_consc_pflag = 0;
        count_consc_rflag = 0;
    }

    if ((frm_count % (64/2)) == 0)
        count_pflag = 0;

    /* Correction of an inactive VAD decision to active */
    if (SD > 0.15 && (Lenergy - MeanSE) > 4.0 && LLenergy > 50.0)
        *Vad = VOICE;
    else if ((SD > 0.38 || (Lenergy - MeanSE) > 4.0) && LLenergy > 50.0)
        *Vad = VOICE;
    else if (Mcount_pflag >= 11.0)
        *Vad = VOICE;

    /* Noise continuity counter */
    if (*Vad == NOISE)
        count_consc_nflag++;
    else
        count_consc_nflag = 0;

    if (count_consc_nflag > 50)
    {
        mus_update = 0;
        mean_mus_update = 0.0;
    }

    /* Music continuity counter */
    if (MLLenergy < 13.0)
        mus_update = MAX(0, mus_update - 10);
    else if (*Vad == NOISE || vlow_freq_noise == 1)
    {
        NMeanPgain = 0.8*NMeanPgain + 0.2*sum2;
        if (vlow_freq_noise == 1 || (NMeanPgain < 0.55 &&
            (((Lenergy - MeanSE) < 2.0) ||
             (MeanPgain < 0.43 && SD < 0.050))))
            mus_update = MAX(0, mus_update - 100);
    }
    else if (rc[1] < 12.8*delta_lsf - 0.8 || MeanPgain > 0.667*rc[1] + 1.2667)
    {
        diff1 = 12.8*delta_lsf - 0.8 - rc[1];
        diff2 = MeanPgain - 0.667*rc[1] - 1.2667;
        mus_update = MAX(0, mus_update - 1000*MAX(diff1, diff2));
    }
    else if ((Lenergy - MeanSE) > 4.0)
    {
        if (NMeanPgain > 0.75 && mrc[1] < 0.55)
            mus_update = MIN(mus_update + 100, 32767);
        else
            mus_update = MIN(mus_update + 1, 32767);
    }

    /* Running mean of the music counter and final decision */
    mean_mus_update = 0.9*mean_mus_update + 0.1*mus_update;

    if ((Mcount_pflag >= 18.0) || mean_mus_update > 200.0)
    {
        music_flg = 1;
        MeanSE = 0.0;
    }
    else
        music_flg = 0;

    return(music_flg);
The variables in the computer code correspond to the variables in the method
associated with Figure 2 as shown in Table 1.

Table 1

    Description variable          C-code variable
    $E$                           LLenergy
    $\bar{E}$                     MLLenergy
    $k$                           rc
    $\bar{k}_n$                   mrc
    $SD$                          SD
    $\bar{E}'$                    MeanSE
    $E'$                          Lenergy
    $corr_p$                      pgains
    $\overline{corr}_p$           MeanPgain
    $F_p$                         PFLAG
    $c_s$                         count_consc_rflag
    $\bar{c}_p$                   Mcount_pflag
    $c_{p0}$                      count_consc_pflag
    $F_v$                         vlow_freq_noise
    $c_n$                         count_consc_nflag
    $c_m$                         mus_update
    $\overline{corr}_p^{\,N}$     NMeanPgain
    $\Delta_{lsf}$                delta_lsf
    $lsf(2)$                      lsf[1]
    $lsf(1)$                      lsf[0]
    $\overline{lsf}(1)$           mlsf[0]
After a frame or portion of the input or speech signal is classified as music or
as a music frame, the speech coding of the music frame may be done at higher bit rates
to accommodate the music signal. In an alternate embodiment, the speech coding of the
music frame is done to reduce or essentially eliminate music from the synthesized
speech signal. In one aspect, an essentially zero gain is applied to a codevector
representing a signal waveform of the music frame.
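How the classification might drive the coder can be sketched with a small policy function. The three rate levels, and the choice to code noise at the lowest rate, are assumptions for illustration; the text states only that music frames may be coded at higher bit rates or, alternatively, suppressed by applying an essentially zero gain to the codevector.

```c
/* Hypothetical classes and rate levels; not the codec's actual rate set. */
typedef enum { CLASS_NOISE, CLASS_SPEECH, CLASS_MUSIC } FrameClass;
typedef enum { RATE_EIGHTH, RATE_HALF, RATE_FULL } BitRate;

/* Music gets the highest rate; speech keeps the rate chosen elsewhere. */
static BitRate select_rate(FrameClass cls, BitRate speech_rate)
{
    switch (cls) {
    case CLASS_MUSIC:
        return RATE_FULL;
    case CLASS_NOISE:
        return RATE_EIGHTH;
    default:
        return speech_rate;
    }
}

/* Alternate embodiment: suppress music by zeroing the codevector gain. */
static double codevector_gain(FrameClass cls, double gain, int suppress_music)
{
    return (cls == CLASS_MUSIC && suppress_music) ? 0.0 : gain;
}
```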

The embodiments discussed in this invention are described with reference to
speech signals; however, processing of any analog signal is possible. It also is
understood that the numerical values provided may be converted to floating-point,
decimal-point, fixed-point, or other similar numerical representations that may vary
without compromising functionality. Further, functional blocks identified as modules
are not intended to represent discrete structures and may be combined or further
subdivided in various embodiments. Additionally, the speech coding system may be
provided partially or completely on one or more digital signal processing (DSP) chips.
The DSP chip may be programmed with source code. The source code may first be
translated into fixed point and then translated into a programming language that is
specific to the DSP. The translated source code then may be downloaded into the
DSP. One example of source code is C or C++ language source code. Other
source codes may be used.
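Floating-point constants such as the 0.75 and 0.25 smoothing weights in the listing would typically be converted to a fixed-point format before running on a DSP. A minimal sketch assuming Q15 format (16-bit, 15 fractional bits); the helper names are hypothetical and are not part of any DSP library.

```c
#include <stdint.h>

/* Converts a value in [-1, 1) to Q15 with rounding and saturation. */
static int16_t to_q15(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)
        return INT16_MAX;
    if (scaled <= -32768.0)
        return INT16_MIN;
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Q15 multiply: Q30 product, rounded and shifted back to Q15. */
static int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b; /* Q30 product */
    p += 1 << 14;                        /* round to nearest */
    p >>= 15;
    if (p > INT16_MAX)
        p = INT16_MAX;                   /* only -1 * -1 overflows */
    return (int16_t)p;
}
```

For example, 0.75 becomes 24576 and 0.25 becomes 8192, and their Q15 product is 6144, which represents 0.1875.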

While various embodiments of the invention have been described, it will be
apparent to those of ordinary skill in the art that many more embodiments and
implementations are possible that are within the scope of this invention. Accordingly,
the invention is not to be restricted except in light of the attached claims and their
equivalents.

What is claimed is:

1. A speech coding system with a music classifier, comprising:

an encoder disposed to receive an input signal, the encoder to provide a
bitstream based upon a speech coding of a portion of the input signal, the speech
coding having a bit rate;

where the encoder provides a classification of the input signal, where
the classification comprises at least music; and

where the encoder adjusts the bit rate in response to the classification
of the input signal.

2. The speech coding system according to Claim 1, where the speech
coding comprises code excited linear prediction (CELP).

3. The speech coding system according to Claim 1, where the speech
coding comprises extended code excited linear prediction (eX-CELP).

4. The speech coding system according to Claim 1, where the
classification comprises one of noise, speech, and music.

5. The speech coding system according to Claim 4, further comprising a 
voice activity detector (VAD), the VAD to provide a detection of at least one of active 
speech and inactive speech. 

6. The speech coding system according to Claim 1, where the portion of
the input signal is one of a frame, a sub-frame, and a half frame.

7. The speech coding system according to Claim 1, where the encoder 
comprises a digital signal processing (DSP) chip. 

8. The speech coding system according to Claim 1, further comprising a
decoder operatively connected to receive the bitstream from the encoder, the decoder
to provide a reconstructed signal based upon the bitstream.
9. The speech coding system according to Claim 1, where the encoder
compares at least one signal parameter to at least one threshold to determine the
classification of the input signal.

10. The speech coding system according to Claim 1, where the at least one
signal parameter comprises at least one of a frame energy, line spectral frequencies, a
spectral difference, a partial residual, a normalized pitch correlation, and at least one
counter.

11. The speech coding system according to Claim 1, where the at least one
counter comprises at least one of a spectral continuity counter, a periodicity continuity
counter, a noise continuity counter, and a music continuity counter.

12. The speech coding system according to Claim 1, where at least one of
the at least one signal parameter comprises a running mean.

13. A method of classifying music in a speech coding system, comprising:

determining at least one first signal parameter in response to an input
signal;

comparing the at least one first signal parameter to at least one noise
threshold;

when the at least one first signal parameter is not beyond the at least
one noise threshold, classifying the input signal as noise;

when the at least one first signal parameter is beyond the at least one
noise threshold, determining at least one second signal parameter in response to the
input signal;

comparing the at least one second signal parameter to at least one
music threshold;

when the at least one second signal parameter is beyond the at least
one music threshold, classifying the input signal as speech; and

when the at least one second signal parameter is not beyond the at least
one music threshold, classifying the input signal as music.
14. The method of classifying music according to Claim 13, where the at
least one first signal parameter comprises at least one of a noise-to-signal ratio and a
frame energy.

15. The method of classifying music according to Claim 13, where the at
least one second signal parameter comprises at least one of a frame energy, line
spectral frequencies, a spectral difference, a partial residual, and a normalized pitch
correlation.

16. The method of classifying music according to Claim 15, where the at
least one second signal parameter further comprises at least one counter.

17. The method of classifying music according to Claim 16, where the at
least one counter comprises at least one of a spectral continuity counter, a periodicity
continuity counter, a noise continuity counter, and a music continuity counter.

18. The method of classifying music according to Claim 15, where at least 
one of the at least one second signal parameter comprises a running mean. 

19. The method of classifying music according to Claim 16, further
comprising resetting the at least one counter in response to the at least one threshold.